majority function


Hardness of Learning Fixed Parities with Neural Networks

Shoshani, Itamar, Shamir, Ohad

arXiv.org Machine Learning

Learning parity functions is a canonical problem in learning theory, which, although computationally tractable, is not amenable to standard learning algorithms such as gradient-based methods. This hardness is usually explained via statistical query lower bounds [Kearns, 1998]. However, these bounds only imply that for any given algorithm, there is some worst-case parity function that will be hard to learn. Thus, they do not explain why fixed parities - say, the full parity function over all coordinates - are difficult to learn in practice, at least with standard predictors and gradient-based methods [Abbe and Boix-Adsera, 2022]. In this paper, we address this open problem by showing that for any fixed parity of some minimal size, using it as a target function to train one-hidden-layer ReLU networks with perturbed gradient descent will fail to produce anything meaningful. To establish this, we prove a new result about the decay of the Fourier coefficients of linear threshold (or weighted majority) functions, which may be of independent interest.
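
A minimal numpy sketch of the setup the abstract describes (illustrative only: the width, learning rate, noise scale, and step count are arbitrary choices, not the paper's construction). A one-hidden-layer ReLU network is trained with perturbed gradient descent on the full parity over {-1, +1}^n, and its predictions stay essentially uncorrelated with the target:

```python
import numpy as np

rng = np.random.default_rng(0)
n, width, batch, steps = 20, 128, 256, 2000
lr, noise = 0.01, 1e-3                             # perturbed-GD noise scale

def parity(X):
    # Full parity over {-1, +1}^n: the product of all coordinates.
    return np.prod(X, axis=1)

W = rng.normal(scale=1 / np.sqrt(n), size=(width, n))
b = np.zeros(width)
v = rng.normal(scale=1 / np.sqrt(width), size=width)

for _ in range(steps):
    X = rng.choice([-1.0, 1.0], size=(batch, n))   # fresh random batch
    y = parity(X)
    H = np.maximum(X @ W.T + b, 0.0)               # hidden ReLU layer
    err = H @ v - y                                # square-loss residual
    gv = H.T @ err / batch
    gH = np.outer(err, v) * (H > 0)                # backprop through ReLU
    gW = gH.T @ X / batch
    gb = gH.mean(axis=0)
    v -= lr * (gv + noise * rng.normal(size=gv.shape))   # perturbed steps
    W -= lr * (gW + noise * rng.normal(size=gW.shape))
    b -= lr * (gb + noise * rng.normal(size=gb.shape))

Xt = rng.choice([-1.0, 1.0], size=(4096, n))
pred = np.maximum(Xt @ W.T + b, 0.0) @ v
print(np.corrcoef(pred, parity(Xt))[0, 1])   # stays near 0: nothing learned
```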


On the Trade-off between the Number of Nodes and the Number of Trees in a Random Forest

Akutsu, Tatsuya, Melkman, Avraham A., Takasu, Atsuhiro

arXiv.org Artificial Intelligence

In this paper, we focus on the prediction phase of a random forest and study the problem of representing a bag of decision trees by a smaller bag of decision trees, where we consider only binary decision problems on the binary domain and simple decision trees in which an internal node is limited to querying the Boolean value of a single variable. As a main result, we show that the majority function of $n$ variables can be represented by a bag of $T$ ($< n$) decision trees, each of polynomial size, if $n-T$ is a constant, where $n$ and $T$ must be odd (to avoid ties). We also show that a bag of $n$ decision trees can be represented by a bag of $T$ decision trees, each of polynomial size, if $n-T$ is a constant and a small classification error is allowed. A related result on $k$-out-of-$n$ functions is also presented.
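
The trivial direction of this representation question is easy to check in code: MAJ_n is exactly the majority vote of a bag of n single-node trees, where tree i just returns x_i, and the paper asks when a smaller bag of T < n polynomial-size trees suffices. A toy verification (n = 5 is an arbitrary odd choice):

```python
from itertools import product

def majority(x):
    # MAJ_n on {0, 1}^n (n odd, so there are no ties)
    return int(sum(x) > len(x) // 2)

def bag_vote(trees, x):
    # A "bag" is a list of tree functions; the bag outputs their majority vote.
    return int(sum(t(x) for t in trees) > len(trees) // 2)

n = 5
stumps = [lambda x, i=i: x[i] for i in range(n)]   # tree i queries x_i only
assert all(bag_vote(stumps, x) == majority(x)
           for x in product([0, 1], repeat=n))
print("MAJ_%d equals the vote of %d one-node trees" % (n, n))
```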


Influence and Dynamic Behavior in Random Boolean Networks

Seshadhri, C., Vorobeychik, Yevgeniy, Mayo, Jackson R., Armstrong, Robert C., Ruthruff, Joseph R.

arXiv.org Artificial Intelligence

We present a rigorous mathematical framework for analyzing the dynamics of a broad class of Boolean network models. We use this framework to provide the first formal proof of many of the standard critical transition results in Boolean network analysis, and offer analogous characterizations for novel classes of random Boolean networks. We precisely connect the short-run dynamic behavior of a Boolean network to the average influence of the transfer functions. We show that some of the assumptions traditionally made in the more common mean-field analysis of Boolean networks do not hold in general. For example, we offer some evidence that imbalance, or expected internal inhomogeneity, of transfer functions is a crucial feature that tends to drive quiescent behavior far more strongly than previously observed.
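
The "average influence" that the abstract connects to short-run dynamics can be made concrete: the influence of coordinate i on a transfer function f is the probability, over a uniform random input, that flipping x_i flips f's output. A brute-force sketch (exponential in the in-degree k, so only for small transfer functions):

```python
from itertools import product

def influence(f, k, i):
    # Pr over uniform x in {0,1}^k that flipping bit i changes f(x).
    flips = 0
    for x in product([0, 1], repeat=k):
        y = list(x)
        y[i] ^= 1
        flips += f(x) != f(tuple(y))
    return flips / 2 ** k

def total_influence(f, k):
    return sum(influence(f, k, i) for i in range(k))

maj3 = lambda x: int(sum(x) >= 2)       # 3-input majority
xor3 = lambda x: x[0] ^ x[1] ^ x[2]     # 3-input parity
print(total_influence(maj3, 3))   # 1.5: each bit is pivotal half the time
print(total_influence(xor3, 3))   # 3.0: every flip changes the output
```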


Neural Network Visualization

Wejchert, Jakub, Tesauro, Gerald

Neural Information Processing Systems

We have developed graphics to visualize static and dynamic information in layered neural network learning systems. Emphasis was placed on creating new visuals that make use of spatial arrangements, size information, animation and color. We applied these tools to the study of back-propagation learning of simple Boolean predicates, and have obtained new insights into the dynamics of the learning process.
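
One way to sketch the size-and-color idea in a few lines is a Hinton-style weight diagram, where each weight is drawn as a square whose area encodes magnitude and whose color encodes sign. This is a generic simplification for illustration, not a reconstruction of the paper's actual displays:

```python
import numpy as np
import matplotlib.pyplot as plt

def hinton(W, ax):
    # Draw weight matrix W: square area ~ |w|; white = positive, black = negative.
    ax.set_facecolor("gray")
    ax.set_aspect("equal")
    scale = np.abs(W).max() + 1e-12
    for (i, j), w in np.ndenumerate(W):
        s = np.sqrt(abs(w) / scale)
        ax.add_patch(plt.Rectangle((j - s / 2, i - s / 2), s, s,
                                   facecolor="white" if w > 0 else "black"))
    ax.set_xlim(-1, W.shape[1])
    ax.set_ylim(-1, W.shape[0])
    ax.invert_yaxis()

W = np.random.randn(6, 8)    # e.g. a layer's weight matrix
fig, ax = plt.subplots()
hinton(W, ax)
plt.show()
```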


Asymptotic Convergence of Backpropagation: Numerical Experiments

Ahmad, Subutai, Tesauro, Gerald, He, Yu

Neural Information Processing Systems

We have calculated, both analytically and in simulations, the rate of convergence at long times in the backpropagation learning algorithm for networks with and without hidden units. Our basic finding for units using the standard sigmoid transfer function is 1/t convergence of the error for large t, with at most logarithmic corrections for networks with hidden units. Other transfer functions may lead to a slower polynomial rate of convergence. Our analytic calculations were presented in (Tesauro, He & Ahmad, 1989). Here we focus in more detail on our empirical measurements of the convergence rate in numerical simulations, which confirm our analytic results.
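
The 1/t law is easy to observe in the simplest setting the abstract covers: a single sigmoid unit, one training pattern, no hidden layer (the pattern, target, and learning rate below are arbitrary illustrative choices, not the paper's experiments). If the error E(t) decays like 1/t, then t * E(t) should level off at a constant:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

eta, w = 0.5, 0.0                  # learning rate; single weight, input x = 1
for t in range(1, 10**6 + 1):
    y = sigmoid(w)                 # unit output; target is 1
    e = 1.0 - y
    w += eta * e * y * (1.0 - y)   # gradient step on E = e**2 / 2
    if t in (10**3, 10**4, 10**5, 10**6):
        print(t, t * e * e / 2)    # roughly constant, i.e. E ~ 1/t
```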


Scaling and Generalization in Neural Networks: A Case Study

Ahmad, Subutai, Tesauro, Gerald

Neural Information Processing Systems

The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to represent the input, and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, but as yet there are few rigorous results. In this paper we summarize a study of generalization in the simplest possible case: perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number of useful properties. We find that many aspects of generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved.
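
A toy version of that case study (sizes and seed are illustrative, not the paper's): a perceptron is trained on m random examples of the majority function MAJ_25, and its generalization accuracy is measured on held-out inputs as m grows:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25                                   # odd, so MAJ_n has no ties

def maj(X):
    # MAJ_n on {-1, +1}^n: +1 iff a majority of the input units are on.
    return np.sign(X.sum(axis=1))

def train_perceptron(X, y, epochs=50):
    w = np.zeros(n)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:       # mistake-driven perceptron update
                w += yi * xi
    return w

Xtest = rng.choice([-1.0, 1.0], size=(5000, n))
for m in (10, 50, 250, 1000):            # number of training patterns
    Xtr = rng.choice([-1.0, 1.0], size=(m, n))
    w = train_perceptron(Xtr, maj(Xtr))
    acc = np.mean(np.sign(Xtest @ w) == maj(Xtest))
    print(m, round(float(acc), 3))       # accuracy rises with m
```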